Symptom Extraction Using LLMs on the Vlaams Supercomputer

I completed a three-month internship with the Data Science Institute (DSI) at UHasselt, exploring how open-source large language models (LLMs) running on high-performance computing (HPC) infrastructure can be used to extract clinical knowledge from de-identified data. Using the Vlaams Supercomputer (VSC), I built a reproducible pipeline that prompts an LLM to generate a concise list of common symptoms for each disease, identified by its ICD code in MIMIC-IV, and saves the results in tidy JSONL/Parquet files.

The accompanying tutorial page covers a quick VSC refresher, environment setup, SLURM jobs, and the end-to-end workflow. For more on the dataset, see MIMIC-IV on PhysioNet.
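The core loop of such a pipeline can be sketched in a few lines of Python: build one prompt per ICD code, parse the model's reply into a clean symptom list, and write one JSON record per line. This is a minimal sketch under stated assumptions, not the pipeline from the internship. The prompt wording, the function names (`build_prompt`, `parse_symptoms`, `run`), and the `generate` callable that stands in for the model are all hypothetical, and the model-serving setup on the VSC is not shown.

```python
import json
import re
from typing import Callable, Iterable

# Hypothetical prompt template; the wording used in the actual pipeline is not given.
PROMPT_TEMPLATE = (
    "List the most common symptoms of the disease with ICD code {code} "
    "({title}). Answer with a JSON array of short symptom names only."
)


def build_prompt(code: str, title: str) -> str:
    """Fill the (hypothetical) template for one ICD code."""
    return PROMPT_TEMPLATE.format(code=code, title=title)


def parse_symptoms(reply: str) -> list[str]:
    """Turn a free-text LLM reply into a deduplicated, lower-cased symptom list.

    First tries the first JSON array in the reply; if none parses, falls back
    to splitting on newlines, commas and semicolons (e.g. a bulleted list).
    """
    items: list[str] = []
    match = re.search(r"\[.*?\]", reply, flags=re.DOTALL)
    if match:
        try:
            items = [str(x) for x in json.loads(match.group(0))]
        except json.JSONDecodeError:
            items = []
    if not items:
        items = re.split(r"[\n,;]+", reply)

    seen: set[str] = set()
    symptoms: list[str] = []
    for item in items:
        # Strip bullet markers, quotes and whitespace from both ends.
        s = item.strip(" -*\t\"'").lower()
        if s and s not in seen:
            seen.add(s)
            symptoms.append(s)
    return symptoms


def run(records: Iterable[tuple[str, str]],
        generate: Callable[[str], str],
        out_path: str) -> int:
    """Prompt the model once per (code, title) pair and write tidy JSONL rows."""
    n = 0
    with open(out_path, "w", encoding="utf-8") as fh:
        for code, title in records:
            reply = generate(build_prompt(code, title))
            row = {"icd_code": code, "title": title,
                   "symptoms": parse_symptoms(reply)}
            fh.write(json.dumps(row, ensure_ascii=False) + "\n")
            n += 1
    return n


if __name__ == "__main__":
    # Stub in place of a real model call (e.g. an LLM served on a GPU node).
    def fake_generate(prompt: str) -> str:
        return '["Fever", "cough", "fever", "fatigue"]'

    demo = [("J18.9", "Pneumonia, unspecified organism")]
    run(demo, fake_generate, "symptoms.jsonl")
```

Keeping the model behind a plain callable lets the same loop be tested locally with a stub and then pointed at a real model inside a SLURM job. If Parquet output is wanted, the JSONL file can be converted with pandas, for example `pd.read_json(path, lines=True).to_parquet(...)`, which requires pyarrow or fastparquet.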